Results 1 - 20 of 325
1.
BMC Med Inform Decis Mak ; 24(1): 54, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38365677

ABSTRACT

BACKGROUND: Electronic health records (EHRs) contain valuable information for clinical research; however, the sensitive nature of healthcare data presents security and confidentiality challenges. De-identification is therefore essential to protect personal data in EHRs and comply with government regulations. Named entity recognition (NER) methods have been proposed to remove personal identifiers, with deep learning-based models achieving better performance. However, manual annotation of training data is time-consuming and expensive. The aim of this study was to develop an automatic de-identification pipeline for all kinds of clinical documents based on a distant supervised method to significantly reduce the cost of manual annotations and to facilitate the transfer of the de-identification pipeline to other clinical centers. METHODS: We proposed an automated annotation process for French clinical de-identification, exploiting data from the eHOP clinical data warehouse (CDW) of the CHU de Rennes and national knowledge bases, as well as other features. In addition, this paper proposes an assisted data annotation solution using the Prodigy annotation tool. This approach aims to reduce the cost required to create a reference corpus for the evaluation of state-of-the-art NER models. Finally, we evaluated and compared the effectiveness of different NER methods. RESULTS: A French de-identification dataset was developed in this work, based on EHRs provided by the eHOP CDW at Rennes University Hospital, France. The dataset was rich in terms of personal information, and the distribution of entities was quite similar in the training and test datasets. We evaluated a Bi-LSTM + CRF sequence labeling architecture, combined with Flair + FastText word embeddings, on a test set of manually annotated clinical reports. 
The model outperformed the other tested models with an F1 score of 96.96%, demonstrating the effectiveness of our automatic approach for de-identifying sensitive information. CONCLUSIONS: This study provides an automatic de-identification pipeline for clinical notes, which can facilitate the reuse of EHRs for secondary purposes such as clinical research. Our study highlights the importance of using advanced NLP techniques for effective de-identification, as well as the need for innovative solutions such as distant supervision to overcome the challenge of limited annotated data in the medical domain.


Subjects
Deep Learning , Humans , Data Anonymization , Electronic Health Records , Cost-Benefit Analysis , Confidentiality , Natural Language Processing
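
The distant-supervision idea in the abstract above (labeling text automatically from existing knowledge sources instead of by hand) can be illustrated with a minimal sketch. It assumes a small in-memory gazetteer and a pre-tokenized sentence; the function name `distant_label`, the tag names, and the sample entries are hypothetical and are not taken from the paper, whose actual pipeline draws on the eHOP warehouse and national knowledge bases.

```python
def distant_label(tokens, gazetteer):
    """Assign BIO tags by longest match against a gazetteer.

    gazetteer maps a tuple of lowercased tokens to an entity type.
    This is an illustrative stand-in for automated annotation, not the
    authors' implementation.
    """
    labels = ["O"] * len(tokens)
    max_len = max((len(key) for key in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-token names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                tag = gazetteer[key]
                labels[i] = "B-" + tag
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + tag
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Hypothetical gazetteer entries and sentence, for illustration only.
GAZETTEER = {
    ("marie", "dupont"): "NAME",
    ("rennes",): "CITY",
}
TOKENS = ["Mme", "Marie", "Dupont", "vit", "à", "Rennes"]

for token, label in zip(TOKENS, distant_label(TOKENS, GAZETTEER)):
    print(token, label)
```

Labels produced this way are noisy, which is why such a corpus is typically used for training while a smaller manually checked set is kept for evaluation.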
2.
J Nerv Ment Dis ; 212(1): 2-3, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38166181

ABSTRACT

ABSTRACT: The field of psychiatry has been limited in its use of patient videos for educational purposes because essential facial information must be obscured to protect patient privacy, confidentiality, and dignity. This article calls attention to emerging technologies for deidentification of patients in video recordings while still preserving facial expression. Fully anonymized videos could be used to augment the education of psychiatric residents and for continuing education of the psychiatric workforce. This article suggests projects that deidentification technology could make possible; it also outlines some complex problems that would need to be addressed before the field could use this potentially transformative technology.


Subjects
Confidentiality , Data Anonymization , Humans , Video Recording , Educational Status , Technology
3.
Stud Health Technol Inform ; 310: 1456-1457, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269694

ABSTRACT

Extracting information from free text in clinical records requires de-identification as a pre-processing step, because the records contain patients' protected health information (PHI). We therefore aimed to define a list of PHI categories and to fine-tune a deep learning BERT model to develop a de-identification model. The fine-tuned model achieved a strict F1 score of 0.924. Given this score, the fine-tuned model can serve as the basis for developing a de-identification model.


Subjects
Data Anonymization , Deep Learning , Humans , Republic of Korea
4.
JAMA Surg ; 159(1): 104-105, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-37878296

ABSTRACT

This article reviews the implementation of standards for surgical video deidentification.


Subjects
Confidentiality , Data Anonymization , Humans , Video Recording
5.
J Genet Genomics ; 51(2): 243-251, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37714454

ABSTRACT

The growth in biomedical data resources has raised potential privacy concerns and risks of genetic information leakage. For instance, exome sequencing aids clinical decisions by comparing data through web services, but it requires significant trust between users and providers. To alleviate privacy concerns, the most commonly used strategy is to anonymize sensitive data. Unfortunately, studies have shown that anonymization is insufficient to protect against reidentification attacks. Recently, privacy-preserving technologies have been applied to preserve application utility while protecting the privacy of biomedical data. We present the PICOTEES framework, a privacy-preserving online service of phenotype exploration for genetic-diagnostic variants (https://birthdefectlab.cn:3000/). PICOTEES enables privacy-preserving queries of the phenotype spectrum for a single variant by utilizing trusted execution environment technology, which can protect the privacy of the user's query information, backend models, and data, as well as the final results. We demonstrate the utility and performance of PICOTEES by exploring a bioinformatics dataset. The dataset is from a cohort containing 20,909 genetic testing patients with 3,152,508 variants from the Children's Hospital of Fudan University in China, dominated by the Chinese Han population (>99.9%). Our query results yield a large number of unreported diagnostic variants and previously reported pathogenicity.


Subjects
Data Anonymization , Privacy , Child , Humans , Computational Biology , Genetic Testing , Phenotype
6.
J Med Internet Res ; 25: e48145, 2023 12 06.
Article in English | MEDLINE | ID: mdl-38055317

ABSTRACT

BACKGROUND: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, sensitive health information (SHI) must be removed in several cases to protect patient privacy. Rule-based and machine learning-based methods have been shown to be effective in deidentification. However, very few studies investigated the combination of transformer-based language models and rules. OBJECTIVE: The objective of this study is to develop a hybrid deidentification pipeline for Australian EHR text notes using rules and transformers. The study also aims to investigate the impact of pretrained word embedding and transformer-based language models. METHODS: In this study, we present a hybrid deidentification pipeline called OpenDeID, which is developed using an Australian multicenter EHR-based corpus called OpenDeID Corpus. The OpenDeID corpus consists of 2100 pathology reports with 38,414 SHI entities from 1833 patients. The OpenDeID pipeline incorporates a hybrid approach of associative rules, supervised deep learning, and pretrained language models. RESULTS: The OpenDeID achieved a best F1-score of 0.9659 by fine-tuning the Discharge Summary BioBERT model and incorporating various preprocessing and postprocessing rules. The OpenDeID pipeline has been deployed at a large tertiary teaching hospital and has processed over 8000 unstructured EHR text notes in real time. CONCLUSIONS: The OpenDeID pipeline is a hybrid deidentification pipeline to deidentify SHI entities in unstructured EHR text notes. The pipeline has been evaluated on a large multicenter corpus. External validation will be undertaken as part of our future work to evaluate the effectiveness of the OpenDeID pipeline.


Subjects
Data Anonymization , Electronic Health Records , Humans , Australia , Algorithms , Hospitals, Teaching
7.
Rev. Hosp. Ital. B. Aires (En línea) ; 43(4): 174-180, Dec. 2023. illus, tab
Article in Spanish | LILACS, UNISALUD, BINACIS | ID: biblio-1532111

ABSTRACT

Introduction: during the COVID-19 pandemic, there was an unprecedented boom in telemedicine, probably due to the forced adoption of technology in the face of restrictive measures. This study aimed to compare the interaction and communication between general practitioners and patients before and during the pandemic based on scheduled outpatient consultations and Health Portal messages. Materials and methods: cross-sectional study with consecutive sampling of scheduled appointments and messages occurring between epidemiological weeks (EW) 10 and 23 of 2019 and 2020, respectively. We included 147 physicians from the Family and Community Medicine Service and an enrolled population of 73,427 patients affiliated with the Hospital Italiano de Buenos Aires health plan. We conducted quantitative and qualitative analyses. Results: there was a 70% reduction in face-to-face consultations (from 76,375 in 2019 to 23,200 in 2020) and a concomitant increase in teleconsultations (from 255 in EW13 to 1089 in EW23). Concurrently, messages increased substantially (from 28,601 in 2019 to 84,916 in 2020), with an abrupt onset at the beginning of confinement and a decreasing trend over time. Before the pandemic, message content related to electronic orders for complementary studies, checking of results, chronic medication prescriptions, or specialist referrals. The most frequent domains during the pandemic were epidemiological information needs, such as preventive measures for COVID-19, pneumococcal vaccine, influenza vaccine, cases or suspected cases, and swab results, among others. Conclusion: the rise of communication and information technologies during the pandemic allowed the continuity of healthcare processes despite physical distancing. There was increased use of messaging for patients' information needs, and the doctor-patient relationship has changed. (AU)


Subjects
Humans , Primary Health Care/methods , Remote Consultation/statistics & numerical data , Ambulatory Care/methods , Physician-Patient Relations , Cross-Sectional Studies , Electronic Mail , Health Communication , Data Anonymization , COVID-19
8.
BMC Med Res Methodol ; 23(1): 258, 2023 11 04.
Article in English | MEDLINE | ID: mdl-37925415

ABSTRACT

BACKGROUND: Subject-level real-world data (RWD) collected during daily healthcare practices are increasingly used in medical research to assess questions that cannot be addressed in the context of a randomized controlled trial (RCT). A novel application of RWD arises from the need to create external control arms (ECAs) for single-arm RCTs. In the analysis of ECAs against RCT data, there is an evident need to manage and analyze RCT data and RWD in the same technical environment. In the Nordic countries, legal requirements may require that the original subject-level data be anonymized, i.e., modified so that the risk to identify any individual is minimal. The aim of this study was to conduct initial exploration on how well pseudonymized and anonymized RWD perform in the creation of an ECA for an RCT. METHODS: This was a hybrid observational cohort study using clinical data from the control arm of the completed randomized phase II clinical trial (PACIFIC-AF) and RWD cohort from Finnish healthcare data sources. The initial pseudonymized RWD were anonymized within the (k, ε)-anonymity framework (a model for protecting individuals against identification). Propensity score matching and weighting methods were applied to the anonymized and pseudonymized RWD, to balance potential confounders against the RCT data. Descriptive statistics for the potential confounders and overall survival analyses were conducted prior to and after matching and weighting, using both the pseudonymized and anonymized RWD sets. RESULTS: Anonymization affected the baseline characteristics of potential confounders only marginally. The greatest difference was in the prevalence of chronic obstructive pulmonary disease (4.6% vs. 5.4% in the pseudonymized compared to the anonymized data, respectively). Moreover, the overall survival changed in anonymization by only 8% (95% CI 4-22%). Both the pseudonymized and anonymized RWD were able to produce matched ECAs for the RCT data. 
Anonymization after matching impacted overall survival analysis by 22% (95% CI -21-87%). CONCLUSIONS: Anonymization may be a viable technique for cases where flexible data transfer and sharing are required. As anonymization necessarily affects some aspects of the original data, further research and careful consideration of anonymization strategies are needed.


Subjects
Biomedical Research , Data Anonymization , Humans , Biomedical Research/methods , Randomized Controlled Trials as Topic , Clinical Trials, Phase II as Topic
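
The abstract above describes anonymization as modifying subject-level data so that the risk of identifying any individual is minimal. As a rough illustration of that idea, the sketch below checks plain k-anonymity, a simpler model than the (k, ε)-anonymity framework the study used. The function names, column names, and rows are all hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same combination of quasi-identifier values (0 for an empty table)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """A table is k-anonymous if every quasi-identifier combination
    occurs in at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical, already-generalized rows (age bands instead of exact ages).
ROWS = [
    {"age_band": "60-69", "region": "North", "copd": True},
    {"age_band": "60-69", "region": "North", "copd": False},
    {"age_band": "70-79", "region": "South", "copd": False},
    {"age_band": "70-79", "region": "South", "copd": False},
    {"age_band": "70-79", "region": "South", "copd": True},
]

print(smallest_group_size(ROWS, ["age_band", "region"]))
print(is_k_anonymous(ROWS, ["age_band", "region"], k=2))
```

Generalizing values until such a check passes is what can shift baseline characteristics slightly, which is the kind of effect the study measured.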
9.
Cas Lek Cesk ; 162(2-3): 61-66, 2023.
Article in English | MEDLINE | ID: mdl-37474288

ABSTRACT

Healthcare data held by state-run organisations is a valuable intangible asset for society. Its use should be a priority for its administrators and the state. A completely paternalistic approach by administrators and the state is undesirable, however much it aims to protect the privacy rights of persons registered in databases. In line with European policies and the global trend, these measures should not outweigh the social benefit that arises from the analysis of these data if the technical possibilities exist to sufficiently protect the privacy rights of individuals. Czech society is having an intense discussion on the topic, but according to the authors, it is insufficiently based on facts and lacks clearly articulated opinions of the expert public. The aim of this article is to fill these gaps. Data anonymization techniques provide a solution to protect individuals' privacy rights while preserving the scientific value of the data. The risk of identifying individuals in anonymised data sets is scalable and can be minimised depending on the type and content of the data and its use by the specific applicant. Finding the optimal form and scope of deidentified data requires competence and knowledge on the part of both the applicant and the administrator. It is in the interest of the applicant, the administrator, as well as the protected persons in the databases that both parties show willingness and have the ability and expertise to communicate during the application and its processing.


Subjects
Confidentiality , Data Anonymization , Humans , Privacy
10.
BMC Res Notes ; 16(1): 98, 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37280717

ABSTRACT

OBJECTIVE: Survival models are used extensively in biomedical sciences, where they allow the investigation of the effect of exposures on health outcomes. It is desirable to use diverse data sets in survival analyses, because this offers increased statistical power and generalisability of results. However, there are often challenges with bringing data together in one location or following an analysis plan and sharing results. DataSHIELD is an analysis platform that helps users to overcome these ethical, governance and process difficulties. It allows users to analyse data remotely, using functions that are built to restrict access to the detailed data items (federated analysis). Previous works have provided survival modelling functionality in DataSHIELD (dsSurvival package), but there is a requirement to provide functions that offer privacy enhancing survival curves that retain useful information. RESULTS: We introduce an enhanced version of the dsSurvival package which offers privacy enhancing survival curves for DataSHIELD. Different methods for enhancing privacy were evaluated for their effectiveness in enhancing privacy while maintaining utility. We demonstrated how our selected method could enhance privacy in different scenarios using real survival data. The details of how DataSHIELD can be used to generate survival curves can be found in the associated tutorial.


Subjects
Data Science , Models, Statistical , Privacy , Survival Analysis , Confidentiality , Data Science/methods , Data Anonymization , Data Analysis , Ethics, Research
11.
Rev. derecho genoma hum ; (58): 15-41, Jan.-Jun. 2023.
Article in Spanish | IBECS | ID: ibc-231269

ABSTRACT

The aim is to analyse the need to code the data of participants in a health study, as well as the techniques that can be used as protective measures, analysing their characteristics, advantages and disadvantages, and approaching the topic from a semi-practical point of view by briefly developing some coding techniques. (AU)


Subjects
Humans , Computer Security/instrumentation , Computer Security/trends , Data Anonymization , Biomedical Research/ethics , Ethics, Research , Clinical Studies as Topic
12.
Neuroinformatics ; 21(3): 575-587, 2023 07.
Article in English | MEDLINE | ID: mdl-37226013

ABSTRACT

Head CT, which includes the facial region, can visualize faces using 3D reconstruction, raising concern that individuals may be identified. We developed a new de-identification technique that distorts the faces of head CT images. Head CT images that were distorted were labeled as "original images" and the others as "reference images." Reconstructed face models of both were created, with 400 control points on the facial surfaces. All voxel positions in the original image were moved and deformed according to the deformation vectors required to move to corresponding control points on the reference image. Three face detection and identification programs were used to determine face detection rates and match confidence scores. Intracranial volume equivalence tests were performed before and after deformation, and correlation coefficients between intracranial pixel value histograms were calculated. Output accuracy of the deep learning model for intracranial segmentation was determined using Dice Similarity Coefficient before and after deformation. The face detection rate was 100%, and match confidence scores were < 90. Equivalence testing of the intracranial volume revealed statistical equivalence before and after deformation. The median correlation coefficient between intracranial pixel value histograms before and after deformation was 0.9965, indicating high similarity. Dice Similarity Coefficient values of original and deformed images were statistically equivalent. We developed a technique to de-identify head CT images while maintaining the accuracy of deep-learning models. The technique involves deforming images to prevent face identification, with minimal changes to the original information.


Subjects
Data Anonymization , Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Head/diagnostic imaging , Algorithms
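
The abstract above compares segmentation output before and after deformation using the Dice Similarity Coefficient, which is defined as 2|A ∩ B| / (|A| + |B|) for two binary masks A and B. A minimal sketch of that formula follows, assuming masks are given as flat sequences of 0/1 values; the function name and the sample masks are hypothetical, and real CT masks would be 3D arrays.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two equally sized binary masks.

    Returns 1.0 when both masks are empty, a common convention that is
    an assumption here rather than something stated in the abstract.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical flattened masks: segmentation before and after deformation.
BEFORE = [1, 1, 0, 0, 1]
AFTER = [1, 0, 0, 1, 1]

print(round(dice_coefficient(BEFORE, AFTER), 4))
```

A value of 1.0 means the two masks coincide exactly, so values close to 1.0 before and after deformation indicate that the segmentation is largely preserved.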
13.
Stud Health Technol Inform ; 302: 28-32, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203603

ABSTRACT

Data sharing provides benefits in terms of transparency and innovation. Privacy concerns in this context can be addressed by anonymization techniques. In our study, we evaluated anonymization approaches which transform structured data in a real-world scenario of a chronic kidney disease cohort study and checked for replicability of research results via 95% CI overlap in two differently anonymized datasets with different protection degrees. Calculated 95% CI overlapped in both applied anonymization approaches and visual comparison presented similar results. Thus, in our use case scenario, research results were not relevantly impacted by anonymization, which adds to the growing evidence of utility-preserving anonymization techniques.


Subjects
Data Anonymization , Privacy , Humans , Cohort Studies , Information Dissemination , Organizations
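
The replicability check in the abstract above rests on whether 95% confidence intervals computed on differently anonymized datasets overlap. The overlap test itself is simple, as the sketch below shows. The function name and the interval values are hypothetical; the abstract does not report the study's actual intervals.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two closed intervals (low, high) share at least one point."""
    low_a, high_a = ci_a
    low_b, high_b = ci_b
    if low_a > high_a or low_b > high_b:
        raise ValueError("each interval must be given as (low, high)")
    return max(low_a, low_b) <= min(high_a, high_b)


# Hypothetical hazard-ratio intervals from two anonymized versions of a dataset.
CI_WEAK_PROTECTION = (1.10, 1.45)
CI_STRONG_PROTECTION = (1.05, 1.60)

print(intervals_overlap(CI_WEAK_PROTECTION, CI_STRONG_PROTECTION))
```

Overlap is a coarse criterion: two intervals can overlap while their point estimates still differ noticeably, which is presumably why the study also compared results visually.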
14.
PLoS One ; 18(4): e0285212, 2023.
Article in English | MEDLINE | ID: mdl-37115783

ABSTRACT

Recently, big data and its applications have grown sharply in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data poses enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. The scientific and industrial communities therefore have a compelling need for large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from them. However, data publishing may lead to the disclosure of individuals' private information. Among modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing resilient distributed datasets (RDDs). In terms of performance, owing to in-memory computation, it is 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformations and actions on RDDs. The authors therefore investigated a Spark-based implementation for preserving the λ-diversity privacy model with two designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.


Subjects
Big Data , Software , Humans , Data Anonymization , Algorithms , Computational Biology
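
The abstract above names City block and Pearson distance functions for clustering records. The sketch below shows the usual textbook forms of both in plain Python, without Spark. It assumes the Pearson distance is defined as one minus the Pearson correlation coefficient, which is a common convention but may differ from the paper's exact definition; the function names and sample vectors are hypothetical.

```python
import math


def city_block_distance(x, y):
    """City block (Manhattan, L1) distance: sum of absolute differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """Pearson distance, assumed here to be 1 - r (r = Pearson correlation)."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - covariance / (spread_x * spread_y)


# Hypothetical numeric records (for example, encoded quasi-identifiers).
RECORD_A = [1.0, 2.0, 3.0]
RECORD_B = [2.0, 4.0, 6.0]

print(city_block_distance(RECORD_A, RECORD_B))
print(round(pearson_distance(RECORD_A, RECORD_B), 6))
```

The two measures behave differently: the city block distance grows with absolute differences in value, whereas the Pearson distance is close to zero for any two records whose values rise and fall together, regardless of scale.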
15.
Sensors (Basel) ; 23(8)2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37112425

ABSTRACT

Health equipment is used to keep track of significant health indicators, automate health interventions, and analyze health indicators. People have begun using mobile applications to track health characteristics and medical demands because devices are now linked to high-speed internet and mobile phones. Such a combination of smart devices, the internet, and mobile applications expands the usage of remote health monitoring through the Internet of Medical Things (IoMT). The accessibility and unpredictable aspects of IoMT create massive security and confidentiality threats in IoMT systems. In this paper, Octopus and Physically Unclonable Functions (PUFs) are used to provide privacy to the healthcare device by masking the data, and machine learning (ML) techniques are used to recover the health data and reduce security breaches on networks. This technique has exhibited 99.45% accuracy, which indicates that it could be used to secure health data with masking.


Subjects
Cell Phone , Octopodiformes , Humans , Animals , Data Anonymization , Seafood , Machine Learning
16.
Int J Med Inform ; 173: 105021, 2023 05.
Article in English | MEDLINE | ID: mdl-36870249

ABSTRACT

INTRODUCTION: Digitized patient progress notes from general practice represent a significant resource for clinical and public health research but cannot feasibly and ethically be used for these purposes without automated de-identification. Internationally, several open-source natural language processing tools have been developed, however, given wide variations in clinical documentation practices, these cannot be utilized without appropriate review. We evaluated the performance of four de-identification tools and assessed their suitability for customization to Australian general practice progress notes. METHODS: Four tools were selected: three rule-based (HMS Scrubber, MIT De-id, Philter) and one machine learning (MIST). 300 patient progress notes from three general practice clinics were manually annotated with personally identifying information. We conducted a pairwise comparison between the manual annotations and patient identifiers automatically detected by each tool, measuring recall (sensitivity), precision (positive predictive value), f1-score (harmonic mean of precision and recall), and f2-score (weighs recall 2x higher than precision). Error analysis was also conducted to better understand each tool's structure and performance. RESULTS: Manual annotation detected 701 identifiers in seven categories. The rule-based tools detected identifiers in six categories and MIST in three. Philter achieved the highest aggregate recall (67%) and the highest recall for NAME (87%). HMS Scrubber achieved the highest recall for DATE (94%) and all tools performed poorly on LOCATION. MIST achieved the highest precision for NAME and DATE while also achieving similar recall to the rule-based tools for DATE and highest recall for LOCATION. Philter had the lowest aggregate precision (37%), however preliminary adjustments of its rules and dictionaries showed a substantial reduction in false positives. 
CONCLUSION: Existing off-the-shelf solutions for automated de-identification of clinical text are not immediately suitable for our context without modification. Philter is the most promising candidate due to its high recall and flexibility; however, it will require extensive revision of its pattern-matching rules and dictionaries.


Subjects
Electronic Health Records , General Practice , Humans , Confidentiality , Data Anonymization , Australia , Natural Language Processing
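
The abstract above defines the f1-score as the harmonic mean of precision and recall and the f2-score as weighting recall twice as heavily as precision. Both are special cases of the general F-beta formula, (1 + β²)·P·R / (β²·P + R). The sketch below applies that formula to the aggregate precision and recall the abstract reports for Philter. This is illustrative arithmetic only: the abstract does not report the tools' f-scores, and the study may have aggregated them differently. The function name `f_beta` is hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-beta score; beta > 1 weights recall more than precision."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta > 0")
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Aggregate figures quoted in the abstract for Philter.
PHILTER_PRECISION = 0.37
PHILTER_RECALL = 0.67

print(round(f_beta(PHILTER_PRECISION, PHILTER_RECALL, beta=1.0), 4))
print(round(f_beta(PHILTER_PRECISION, PHILTER_RECALL, beta=2.0), 4))
```

Because recall is higher than precision here, the f2 value comes out above the f1 value. That matches the rationale for reporting f2 in de-identification work, where a missed identifier is usually considered costlier than a false positive.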
17.
Lancet Digit Health ; 5(4): e239-e247, 2023 04.
Article in English | MEDLINE | ID: mdl-36797124

ABSTRACT

Wearable devices have made it easier to generate and share data collected on individuals. This systematic review seeks to investigate whether deidentifying data from wearable devices is sufficient to protect the privacy of individuals in datasets. We searched Web of Science, IEEE Xplore Digital Library, PubMed, Scopus, and the ACM Digital Library on Dec 6, 2021 (PROSPERO registration number CRD42022312922). We also performed manual searches in journals of interest until April 12, 2022. Although our search strategy had no language restrictions, all retrieved studies were in English. We included studies showing reidentification, identification, or authentication with data from wearable devices. Our search retrieved 17 625 studies, and 72 studies met our inclusion criteria. We designed a custom assessment tool for study quality and risk of bias assessments. 64 studies were classified as high quality and eight as moderate quality, and we did not detect any bias in any of the included studies. Correct identification rates were typically 86-100%, indicating a high risk of reidentification. Additionally, as little as 1-300 s of recording were required to enable reidentification from sensors that are generally not thought to generate identifiable information, such as electrocardiograms. These findings call for concerted efforts to rethink methods for data sharing to promote advances in research innovation while preventing the loss of individual privacy.


Subjects
Data Anonymization , Wearable Electronic Devices , Humans , Confidentiality , Privacy
18.
J Res Adolesc ; 33(1): 141-153, 2023 03.
Article in English | MEDLINE | ID: mdl-35860849

ABSTRACT

The present study examined whether declines in religiousness across adolescence precede religious deidentification in young adulthood. Data came from the National Study of Youth and Religion. Participants were religiously affiliated for the first three waves of the longitudinal study (N = 1144). Latent growth curve models found significant declines across adolescence in church attendance, prayer, scripture study, religious importance, and spirituality, whereas doubt was stable across time. Then, logistic regression models specified the latent intercepts and slopes as predictors of later (Wave 4) deidentification. Significant negative links were found for the intercepts and slopes on church attendance, prayer, scripture study, religious importance, and spirituality. For doubt, a significant, positive link was found for the intercept.


Subjects
Data Anonymization , Religion , Humans , Adolescent , Young Adult , Adult , Longitudinal Studies , Spirituality , Logistic Models
19.
J Am Med Inform Assoc ; 30(2): 318-328, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36416419

ABSTRACT

OBJECTIVE: To develop an automated deidentification pipeline for radiology reports that detects protected health information (PHI) entities and replaces them with realistic surrogates, "hiding in plain sight." MATERIALS AND METHODS: In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. RESULTS: Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. DISCUSSION: Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. CONCLUSIONS: A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.


Subjects
Data Anonymization , Radiology , Humans , Retrospective Studies , Algorithms , Health Facilities , Natural Language Processing
20.
Int J Popul Data Sci ; 8(1): 2153, 2023.
Article in English | MEDLINE | ID: mdl-38414537

ABSTRACT

Introduction: Using data in research often requires that the data first be de-identified, particularly in the case of health data, which often include Personal Identifiable Information (PII) and/or Personal Health Identifying Information (PHII). There are established procedures for de-identifying structured data, but de-identifying clinical notes, electronic health records, and other records that include free text data is more complex. Several different ways to achieve this are documented in the literature. This scoping review identifies categories of de-identification methods that can be used for free text data. Methods: We adopted an established scoping review methodology to examine review articles published up to May 9, 2022, in Ovid MEDLINE; Ovid Embase; Scopus; the ACM Digital Library; IEEE Xplore; and Compendex. Our research question was: What methods are used to de-identify free text data? Two independent reviewers conducted title and abstract screening and full-text article screening using the online review management tool Covidence. Results: The initial literature search retrieved 3,312 articles, most of which focused primarily on structured data. Eighteen publications describing methods of de-identification of free text data met the inclusion criteria for our review. The majority of the included articles focused on removing categories of personal health information identified by the Health Insurance Portability and Accountability Act (HIPAA). The de-identification methods they described combined rule-based methods or machine learning with other strategies such as deep learning. Conclusion: Our review identifies and categorises de-identification methods for free text data as rule-based methods, machine learning, deep learning and a combination of these and other approaches. Most of the articles we found in our search refer to de-identification methods that target some or all categories of PHII. 
Our review also highlights how de-identification systems for free text data have evolved over time and points to hybrid approaches as the most promising approach for the future.


Subjects
Confidentiality , Health Records, Personal , Data Anonymization , Electronic Health Records , Health Insurance Portability and Accountability Act , Review Literature as Topic , United States